Supplementary material for Discrete Valued Neural Communication in Structured Architectures Enhances Generalization

Neural Information Processing Systems

In this appendix, as a complement to Theorems 1-2, we provide additional theorems, Theorems 3-4, which further illustrate the two advantages of the discretization process by considering an abstract model with a discretization bottleneck. For the advantage in sensitivity, the error due to potential noise and perturbation without discretization (the third term $\xi(w, r', M', d) > 0$ in Theorem 4) is shown to be reduced to zero with discretization in Theorem 3. See Appendix C.1 for a simple comparison between the bound of Theorem 3 and that of Theorem 4 when the metric spaces $(M, d)$ and $(M', d')$ are chosen to be Euclidean spaces.

We now introduce the notation used in Theorems 3-4. Here, $\phi_w$ represents a deep neural network with weight parameters $w \in W \subseteq \mathbb{R}^D$, $q_e$ is the discretization process with the codebook $e \in E \subseteq \mathbb{R}^{L \times m}$, and $h_\theta$ represents a deep neural network with parameters $\theta \in \Theta \subseteq \mathbb{R}^\zeta$. Thus, the tuple of all learnable parameters is $(w, e, \theta)$.
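To make the notation concrete, the following is a minimal sketch of the composition of the three components: an encoder, a codebook lookup, and a readout. It is an illustration under stated assumptions, not the construction used in the theorems. The one-layer `phi` and linear `h` are hypothetical stand-ins for the deep networks. The nearest-neighbour lookup in `quantize`, and the splitting of the representation into segments of length m, are assumptions about how the discretization process acts. All numeric values are chosen by hand.

```python
import math

def phi(w, x):
    """Hypothetical stand-in for the network phi_w: a single tanh layer."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]

def quantize(e, z):
    """Assumed form of q_e: split z into segments of length m and replace
    each segment with its nearest codebook row (squared Euclidean distance)."""
    m = len(e[0])
    assert len(z) % m == 0, "len(z) must be a multiple of the code length m"
    out = []
    for start in range(0, len(z), m):
        seg = z[start:start + m]
        nearest = min(e, key=lambda code: sum((c - s) ** 2 for c, s in zip(code, seg)))
        out.extend(nearest)
    return out

def h(theta, z):
    """Hypothetical stand-in for the network h_theta: a linear readout."""
    return sum(t * zi for t, zi in zip(theta, z))

def model(w, e, theta, x):
    """Composition h_theta(q_e(phi_w(x))) with learnable parameters (w, e, theta)."""
    return h(theta, quantize(e, phi(w, x)))

# Hand-picked illustrative parameters: D = 4 weights, codebook with L = 4 rows of length m = 2.
w = [[1.0, 0.0], [0.0, 1.0]]
e = [[0.5, -0.5], [-0.5, 0.5], [0.5, 0.5], [-0.5, -0.5]]
theta = [1.0, 2.0]

x = [0.5, -0.5]
x_noisy = [0.55, -0.45]  # small perturbation of x

with_disc = (model(w, e, theta, x), model(w, e, theta, x_noisy))
without_disc = (h(theta, phi(w, x)), h(theta, phi(w, x_noisy)))

print("with discretization:   ", with_disc)
print("without discretization:", without_disc)
```

In this toy setting the perturbed input is mapped to the same codebook row as the clean input, so the discretized outputs coincide while the undiscretized outputs differ. This only illustrates the intuition behind the sensitivity term. A perturbation large enough to change the selected codebook row would change the output.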